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ABSTRACT 

In  this  paper  we  present  alternative  formulations  of  the 
algorithms  of  Chen  and  Kuck  [1]  in  such  a  way  as  to  make  them  easier 
to  follow  and  implement.  We  also  give  a  detailed  error  analysis, 
showing  that  if  x  is  the  computed  solution  of  the  triangular  system 

Lx  =  f,  then  it  satisfies  the  equation  (L  +  6L)x  =  f  where  ||6L||  = 

2  2    I  I  I 

0(n  log  n)  £  K  (L)||L||.  Here  k(L)  is  the  condition  number  of  L, 

I  I'll  denotes  the  °°-norm  and  e  is  the  unit  roundoff. 


1.  Introduction 


Chen  and  Kuck  [1]  proposed  a  parallel  algorithm  for  solving 
the  linear  system  of  equations  Lx  =  f ,  where  L  is  a  unit  lower  tri- 
angular  matrix  of  order  n,  in  O(log  n)  steps,   using  ^p- +  0(n  )  processors 


They  also  showed  that  if  L  is  of  bandwidth  (m+1),  i.e.,  I..   =  0  for 

3  2 
i  -  j  >  m,  the  time  required  is  O(log  m  log  n)  using  j  m  n  +  0(mn) 

processors.  The  problem  has  also  been  investigated  independently  by 

Orcutt  [2].  Unfortunately,  their  derivations  tend  to  obscure  the  fact  that 

both  algorithms  are  rather  simple.  Our  objective  here  is  to  present  these 

algorithms  in  such  a  way  as  to  make  them  easier  to  follow  and  implement. 

In  fact,  using  this  new  formulation  we  are  able  to  establish  that  the 

1  2 
number  of  processors  required  for  the  banded  case  is  y  m  n  +  0(mn)  instead 

3  2 
of  x  "1  "  "*"  0(mn).  Furthermore,  we  present  an  error  analysis  of  the  general 

algorithm  showing  that  if  x  is  the  computed  solution  then  (L  +  6L  )  x  =  f, 

where  6L  is  bounded  by,  ||6L  ||  =  a(n)  ek   (L)  ||L||.  Here,  ||'||  stands 

r  r 

2 
for  the  oo-norm,  a(n)  =  0(n  log  n),  e  is  the  unit  roundoff,  k(L)  is  the 

2 
condition  number  of  L,  and  we  assume  that  terms  of  0(e  )  are  negligible. 

This  bound,  however,  can  be  yery   large  compared  to  that  of  the  sequential 

algorithm,  Wilkinson  [3],  where  ||6L  ||  ^ne  ||L||  . 

Throughout  this  paper  we  assume  that  any  number  of  processors 
can  be  used  at  any  time,  but  we  will  give  bounds  on  this  number.  Each 
processor  can  perform  any  of  the  four  arithmetic  operations  in  one  time 
step.  Furthermore,  if  p  processors  are  used  we  denote  the  computation 
time  by  T  . 


Throughout  this  paper  log  n  =  logpn. 


Without  loss  of  generality,  we  assume  that  diag(L)  is  different 
from  the  identity  and  that  n  =  2^  and  m  =  2^  for  positive  integers  y  and  v. 
where  y  >  4. 


2.  Algorithm  I 

The  inverse  of  L  can  be  expressed  as  [4], 


L"'  =  \  ^-^  \-2 


'1 


where  M^.  =  ( I  -  t^.£^.  e^  in  which  £^  =  (0,...,0,il    ,£ 


(2.1) 
,...,il       ).     The 


and  T^  =  1/S...,   i  =  1,   2,    ...,   n  -  1,  and  M^  =  diag(l , . . .  ,1 ,1/Jl^^) .     The 


solution  X  is  then  given  by 


X  =  M^  \.,  \.2 


M^f  . 


(2.2) 


Evaluating  the  product  (M„_-|  •••  M-if)  in  parallel  requires  y  =  log  n  stages 
At  each  stage  ( j  +  1 )  we  evaluate  the  matrices  M.'^"*"^  =  M^Ji  ^oi         ' 


i  =  1,  2,  ..., 


-r^  -  1,  and  the  vectors  f^J"^^^  =  m/^^  f^"^^  ,  where 


(0)  = 
1      1 


.(0) 


M/"'  =  M,,  f^"^  E  f.  Each  matrix  M.^^^   is  of  the  form. 


M. 


(J) 


,(J) 


(j) 


(j) 


"^i 


(j) 


(2.3) 


where  L^.  ^^^   is  a  lower  triangular  matrix  of  oi^der  2"^,  q.^^^  =  i2'^   -  1, 


^• 


and  r.^J^  =  (n  +  1)  -  (i  +  1)2'^'.  Clearly,  L.^°^  e  ^  ,  and  -S. 

ii 

r  -r  (j)   1 

square  matrix  of  order  2"^  .  Thus, 


(j) 


(0) 


,  where  T.^"^^  is  a 


(j+1)  _ 


4i 


(j) 


r(j)  T  (j)     :i3) 

■"Zi+l    '2i  "-21+1 


(2.4) 


and 


(j+1)  - 


s(j)     T     (J^   +  II     (j) 
^2i+l    '2i         ^  ^Zi 


■(j) 
'2i+l 


(2.5) 


Furthermore,  if  f^-^^  =  (g/^^  ,  g^J^  ,  g^j)  )  whe 


(j) 


I-]    J  ^2    »  y3    ^  wriere  the  vectors  g-, 

and  g^  "^  are  of  orders  2'^'  and  2'^,  respectively.  Then  the  three  subvectors 
constituting  f^J"*"'^  are  given  by, 
f  (J^l)  -  G  (j) 


f  (j+1)  _  r  (j)  ^  (j) 

f  (j+1)  .  s  (j)  n  (j)  +a  (j) 
T3     -  b-i    g2   +  §3 


(2.6) 


where  the  first  2'^'*'  components  of  f^^'^^'   constitute  the  elements  ^., 

(i  =  1 ,2, . . .  ,2"'  ),  of  the  solution  vector  x.  Forming  the  matrices  M. 
requires  1  step  with  -^""^  processors.  The  products  in  (2.4)--(2.6) 
can  be  evaluated  simultaneously  in  1  +  log  2"^  =  (1  +  j)  steps.  The  additions 
in  (2.5)  and  the  last  of  (2.6)  are  evaluated  in  one  additional  step. 
Hence,  the  time  required  for  each  stage  j  +  1  is  t  ^.    -  (2  +  j)  steps. 

Let  s  =  2  ,  then  the  number  of  processors  for  evaluating  the 
products  L^J|^  J^.^^^   and  S^>j|^  T2/-^'^  in  (2.4)  and  (2.5)  is  given  by 

P'  =  2  s  (s  -  1)  and  p"  =  s  r^'^'^  ' ,   respectively.  Since  the  number  of 
processors  required  for  evaluating  the  products  in  (2.4)--(2.6)  is  more  than 


that  required  for  evaluating  the  additions,  evaluating  M^.^  '   requires 

p!"^"^^^  =  p'  +  p"  =  ^  (2n  +  l)s^  -  -^  {4i  +  3)s^  processors, 

(i  =  1,2,...,^-  1).  From  (2.6),  we  see  that  evaluating  f^^'^^' 

requires  Pq     =  2-  (2n  +  1  )s  -  -^  s  processors.  Consequently,  stage  j  +  1 

requires, 

— -1 
p(J-l)  =  ',^    (J>1) 
k=0   "^ 


=  |s3-i 


(5n  +  12)s^  +  1  (n2  +  7n  +  5)5^ 


The  total  time  for  solving  the  system  is,  therefore,  given  by 

y-1 
T  =  2  +  z  (2  +  j)  =  2  +  y(y  +  3)/2  (2.7) 

P      j=0 

steps,  where  we  added  one  further  step  for  M  f^^\  using  no  more  than 

p  =  max[   max  {p^'^'^^},  n(n-l)/2] 
j=0,l...,y-l 

processors.  For  n  ^  16  the  maximum  is  achieved  for  s  =  n/8  yielding 

P  =  ^  (||n^  +  lln  +  12)  (2.8) 

or  p  =  gg  +  0(n  )  processors.  For  n  ^256,  p  1  gx- 

Hence,  we  have  established  the  following  theorem. 

Theorem  2.1 


The  triangular  system  of  equations  Lx  =  f,  where  L  is  a  lower 

1    ?    3 
triangular  matrix  of  order  n,  can  be  solved  in  T  =  p-  log  n  +  j  log  n  +  2  steps 

153     2  

using  no  more  than  p  =  TqoT  ^     "*■  0(n  )  processors.  / / 


Lemma  2.2 


Let  L  be  defined  as  in  Theorem  2.1.  Then  L"  can  be  obtained 


in  the  same  time  required  for  solving  the  system  Lx  =  f,  using  no  more 
than  (21n'^  +  60n^)/128  processors.  /JZ7 

Proof: 

The  time  bound  is  trivially  established  by  Theorem  2.1.  However, 
the  maximum  number  of  processors  used  occurs  at  s  =  n/4,  rather  than  n/8, 
yielding  the  above  number  of  processors.  In  fact,  we  can  show  that  for 
h  >_  n  right-hand  sides  the  maximum  number  of  processors  is  given  by, 

P  ~  128     ^~32~^     ^8~^  ^        ' 

processors. 

We  state,  without  proof,  the  following  lemma  regarding  the 

number  of  operations,  (number  of  nodes  in  the  computational  graph),  Q 

required  for  the  above  algorithm. 

Lemma  2.3 

?    3      CO      TO 

The  algorithm  of  Theorem  2.1  requires  Q  =2T"  """b"  — fi"'*'^ 
operations,  with  a  redundancy  R  =  Q^/T-i  =  0(n). 

Now  a  lemma  in  [5]  states  that,  if  we  have  p  <  p  processors 
then  the  time  required  for  solving  the  triangular  system  is  given  by, 

T;  <  T  +  (Qn  -  T,)/P  (2.10) 

p  —  p    p    p 

Let  p  =  n^/K,  then 

Tp  <  Tp  +  |y  K  (1  +  6)  (2.11) 

where  6  <  (9/n).  For  n  >_  256 

Tp  <  Tp  +  O.IOK  (2.12) 

2  ? 

Hence,  as  long  as  K  =  O(log  n),  T  remains  O(log  n). 


3.  Algorithm  II 

In  this  section  we  present  an  algorithm  that  requires  the  same 

time  as  Algorithm  I  but  roughly  twice  the  number  of  processors.  The 

advantage  of  this  algorithm  is  that  it  can  be  adapted  to  band  systems 


l(0)  .  DL  and  f^^)  . 


Df  where  D  =  diag  (i,^^  i'^^,...,  z^J .     We  form 


nn' 


matrices  D^^^,  (j  =  0,  1 ,  . . . ,  y  -  1 ) ,  such  that  if 

lCj+D  =  d(J)  l(J),  f^J+T)  =  D^J)  f^J"^ 
then  L^'^^  =  I  and  x  =  f^^^  Each  matrix  L^^*^  is  of  the  form 


Let 


(3.1) 


(j)  _ 


(j) 


"21 


G  ^^'^    G  ^J^    I 
^31      '"32      ^: 


.(j) 


.  h^ 


,(J) 
f2 


3(j) 
s's"' 


where  s  =  2"^',  and  g|9^  =  ^^-j/^.^--  Therefore,  (D^^^'V^  =  diag  (L^^"^^ 
L2^J^...,L^^J^),  where. 


(j)  - 


p(j)  T 

^2i,2i-l    ^ 


(3.2) 


Thus,  for  stage  (j+1)  we  have, 


^i^ '  = 


-G 


(j) 
2i,2i-l 


p(j) 
''2i-l,2k-l 


.(j) 
'2i,2k-l 


«(j) 
'2i-l,2k 

.(j) 
'2i,2k 


(3.3) 


for  i.k  +  1  =  2,  3,  ...,  n/2s,  and 


(j+1)  _ 


-G 


(j) 
2i,2i-l 


p(j) 

.(j) 
2i  J 


(3.4) 


for  i  =  1,  2,  ...,  n/2s.  We  can  simultaneously  evaluate  the  products 

^2i,2i-l  ^2i-l,2k-T  ^2i,2i-l  ^2i-l,2k'  ^"^  ^2i,2i-l  ^2i-l  ^" 
t'  =  1  +  log  s  =  (1  +  j)  steps  using  p'  =  s  (2s  +  1)  processors.  Performing 


the  addition  of  the  two  pairs  of  matrices  in  (3.3)  and  the  pair  of  vectors 
in  (3.4)  requires  t"  =  1  step  using  p"  =  s(2s  +1)  <  p'  processors.  Since 
we  have  n(n  -  2s)/8s  matrices  G.j?  ,  and  n/2s  vectors  f.  "^   ,  then  each 


stage 


j  +  1  requires  t^"^"^^  =  t'  +  t"  =  (2  +  j)  steps  using  p^^^^  = 


7^  sn  -  J  s  n  +  sn  processors.  Forming  L^  ^  and  f^  '  requires  1  step 
using  n(n+l)/2  processors.  Hence,  the  total  time  for  solving  the  linear 
system  Lx  =  f  is  given  by. 


y-1 

T   =  1  +   E   (2  +  j)  =  1  +  y(y  +  3)/2 


(3.5) 


j=0 


.(J+1) 


steps  using  p  =  max  [maxfp^""  ^},n(n+l  )/2] .  For  n  >  16  the  maximum  occurs 


j=0,l,...,y-l 


at  s  =  n/4  yielding 


3    2 
n  ,  n 

32   4 


P  =  %-  +  1- 


(3.6) 


processors.  Now  we  introduce  the  following  Theorem  for  solving  banded 
lower  triangular  systems. 


Theorem  3.1 


Let  L  be  a  banded  lower  triangular  matrix  of  order  n  and 
bandwidth  (m  +  1),  (£.  =0  for  i  -  k  >  m) .  Then  the  system  Lx  =  f  can 


be 


1    2 
solved  in  T  =  (2  +  log  m)  log  n  -  2-(log  m  +  log  m)  steps  using 


1„2 


no  more  than  p  =  ^m  n  +  0(mn)  processors  / I 


Proof: 


The  matrix  L  and  the  vector  f  can  be  written  in  the  form, 


L  = 


^1     4 


^2     4 


n_i      x\_ 
m~      m 


,   f  = 


(3.7) 


where  L^.  and  R^  are  mxm  lower  triangular  and  upper  triangular  matrices, 
respectively.  Premul tiplying  both  sides  of  Lx  =  f  by  the  matrix 
D  =  diag  [l:1] 


we  obtain  the  system  L^^  x  =  f^^  where. 


(0)  _ 


m 


.(0) 


m 


:(0) 


m 


g(o)   I 

n_1    m 

m~ 


(3.8) 


and 


10 


L.. 


'  =  f 

[^\°\ 

^(0)" 

= 

(3.9) 


^i-1 


f. 
1 


i  =  2,3. 


From  Theorem  2.1  and  Lemma  2.2  we  can  show  that  solving  the  systems  in 

(0)   1    2    3 
requires  t^  '   =  2  log  m  +  2"  log  m  +  2  steps,  and  by  (2.9),  using 

(0)    21  2 
P   ^  itd  "^  ^   "•■  0(mn)  processors. 


(3.9) 


I 


128 


Now  we  follow  the  above  algorithm  and  form  the  sequence 


l(J+1)  =  D<J'  l'J),  and  f'J^l'  =  d'J>  f'J'  for  j  =  0.  1.  .... 
Each  matrix  L^"^^  is  of  the  form, 


log 


(J)  _ 


:(j) 


:(j) 


G^J^      I 

r 


(3.10) 


.  oJ 


.(j) 


where  r  =  2^.m.  Therefore,  D^^^  =  diag  [  (l('^'^)"M    (i  =  1,  2, 
which 


...p  in 


I. 


(j)  _ 


;(j) 

'2i-l 


(3.11) 


Hence,  for  the  stage  j  +  1 ,  we  have 


n 


g(J^i)  = 


0    -G 


'2i 

(J) 
2i+l 


'2i 


.  i  =  1,  2,  ...,  ^  -  1    (3.12) 


and 


(J+1)  - 


:(j) 

2i-l 


r;(j)     f(j)    .  f(j) 

■^2i-l  ^2i-l   ^2i 


,  i  =  1,  2,  ..., 


2r 


(3.13) 


(J) 


Observing  that  all  except  the  last  m  columns  of  each  matrix  G.^^'  are  zero, 

(j)  .(j) 


(j)  p(j) 
'2i+l  ^2i 


then  -Go^ifi  G\i     and  -G2^_i  "fpi-l  ^^"  ^^   evaluated  simultaneously  in 


1  2 

1  +  log  m  steps  using  p'  =  -p  m(m  +  1 )  n  -  rm   processors.  Finally,  one  more 

step  to  evaluate  fj"^  '   using  p"  =  r  processors.  Therefore,  t^"^    -  2  +  log  m 
steps  and  p^"^  '  =   max{p',p"}  =  ^j  m(m  +  l)n  -  rm   processors.  The  total 


time  is  thus  given  by, 

n_ 

•"  (i) 


log 

Z 

j=o 


=  y(2  +  u)  -  ^  (u  +  1) 


(3.14) 


steps,  where  y  =  log  n  and  u  =  log  m,  with  p  =  maxCp^-^M  processors. 

j 

For  m  =  n/2,  p  E  p^^,  otherwise 


=  .(1)  - 


P  =  P 


1  3 

2  m(m  +  l)n  -  m 


(3.15) 


processors. 


12 

Lemma  3.2 

The  algorithm  of  Theorem  3.1   requires  Q     =  m  n  log  ^  +  0(mn  log  5-) 

operations,  with  redundancy  R     =  Qq/T-.  =  0(m  log  5—). 

~      m^n 
Again,  we  can  show  that  for  p  =  -jt-    <  p  processors,  the  time 

required  for  the  algorithm  of  Theorem  3.1  becomes  [5], 

Tp  i  Tp  ^  2K  log  n_  (3.16) 

Hence,  provided  K  =  O(log  m) ,  T     remains  O(log  m  log  n). 


13 


4.     Error  Analysis 

We  present  an  error  analysis  of  the  algorithm  in  section  2.     Let 
denote  any  of  the  four  arithmetic  operations,  then  a  floating-point 
operation  satisfies, 

f£(d  *  b)  =  (a  *  b){l  +  6) 
where   |6|   <_  e,  e  is  the  unit  roundoff 

^1  J-t 


e  = 

,1-t 


2  S  (rounded  operations) 

(chopped  operations) 


in  which  3  is  the  radix  of  the  machine  and  t  is  the  length  of  the  fraction. 
At  each  stage  j  of  the  algorithm  (j  =  1,  2,  ...»  y),  where  y  =  log  n,  we 
perform  inner  products  of  vectors  u  and  v  the  maximum  order  of  which  is 
(1  +  2"^).  Performing  each  inner  product  in  log  (1  +  2"^)  =  (j  +  1)  steps, 
we  can  show  that, 

|f£(u^)  -  u*  V  I  1  (j  +  2)  £  lul^lvl  (4.2) 

where  |u|  denotes  the  vector  with  elements  |y.|,  0(e  )  terms  are  neglected 
here  and  below.  Without  loss  of  generality,  we  assume  that  ||L|I  £  1. 

If  M^.^"^^  =  fii(M^.^'^^)  ana  f^^^'^  =  fii(f^'^^),  then  the  algorithm  of  Theorem  2.1 

is  given  by, 

M  (J+1)  =  u(j)  m(J)  +  p  (J+1)  ,•  -  1  o      _n.  _  1 
^       ^2i+l  ^2i  "^  ^i      1-1.2,....,  -j^-   1 

f(j+l)  =^/J)  f(j)  +e(j^1)  (4.3) 

for  j  =  0,  1,   ...,  y  -  1,  where  eW     '  is  lower  triangular  and. 


14 


(J)||f(j) 


,(j+l)|  1  (J+2)  e  |M^ 


(4.4) 


Again,  |M|  denotes  the  matrix  with  elements  |y.p|  and  an  inequality  in 
matrix  or  vector  form  applies  separately  to  corresponding  elements  on  each 
side. 

We  introduce  the  following  lemma  in  order  to  establish  bounds 
on  the  norms  of  E^"^  ^  and  e^"^   . 


Lemma  4.1 


Suppose  L  = 


hi  ° 
'-21  '-22 


with  I  |L|  I  ;^  1 .  Then 


I  -1|  I   I  h  -111    ^1  ir-li  I   ill  -1 
I-??!  Ill''-  1 1 »  ^^^   ML  II  £  IJL 


(4.5) 


where  L  = 
Proof: 


hi   0 

hi  ^ 


L' 


Since, 
"-22  '-21  "-11 


-1 
22 


then,  it  is  clear  that  the  first  of  (4.5)  is  satisfied.  Also, 

I   0 


hi  ' 


•21 


Consequently, 


I 


=  L 


0   L 


-1 
22 


I  lIlLlI  ||L-^||  ll|L-^ 
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From  the  above  Lemma  we  see  that, 

ll^j^'^l  I  1  I  IL'""!!  (4.6) 

Also,  from  the  fact  that  f^"^^  =  mJ'^*"^^  f^-^'^^  (2.1),  and  (2.2)  we  obtain, 
f^-^"^  =  M    ....  M^M^f 

=  M"|  ...  M"\  M"\  M"^x  . 
2J     n-2  n-1  n 


Hence,  from  Lemma  4.1, 

(4.7) 


f^^^ll  <  IILM  Mx! 


£  I |X| I 

Consequently,  from  (4.4) 

||E,^^""^^^||  <  (j+2)(l+2'^*)  e  ||L""'||^ 

'  ~  (4.8) 

lle^"^'""^^!!  <.  (j+2)(l+2J")  e    \\l'\\    ||x|| 

For  a  complete  analysis  we  need  to  consider  the  structure  of 
eW'"^^\  From  (2.3)  -  (2.5)  we  see  that  the  first  t.^^  =  (2i+l)2^-l  rows 
of  E^"^  '  are  zero,  hence  ME.   ^  =  E.   '   for  i  <_  t..,.  Furthermore,  the 
last  (n  -  t..  J  columns  of  e!"^*"*"^^  are  zero,  thus  E^-^^^^^M.  =  E?"^"^^^ 

J  '  I  I  I         X*       I 

for  £  >_^^+y     Let  L(k,£)  =  ^W-^    ...  \\  and  y.^  =  i2<^'  -  1. 
We  verify  that 

M.(^'^   =  L"''(l,Yi+ij)(I  +  P^^^h   L(l.Yij)  (4.9) 

where 


/^'=   '^    '^^''.'"ui.v,.,,)e,(^)l-^i,.,.,) 


0  =  1  i-s  "K+I.S'       K  K.S  (4-10) 
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Unfortunately,  however,  we  cannot  take  advantage  of  the  above  properties 
of  eW  '  to  simplify  the  expression  under  the  summation  in  (4.10). 
It  is  trivial  to  show  that  (4.9)  holds  for  j  =  1.  From  (4.9)  and  the 
first  of  (4.3)  we  obtain 

Using  (4.10)  we  easily  show  that 

p(j  +  l)  =  p(j)+p(j)   +  Lfl  Np  (j  +  1)   '\.  .  ,  2^ 

2 
Neglecting  terms  of  0(e  )  we  establish  (4.9). 

From  the  second  equation  in  (4.3),  we  have, 

;("^l)  =  (M„S/^'  ...  m/1)  M/°')f  .  'z'  L-l(2^n)e<^)         (4.11) 

where  L"  (2n,n)  =  I.  Also,  from  (4.9) 

M^m/^^  ...  m/^^  m/°^  =  L-^(I  +  P) 

^■^   fi) 
where,  P  =  Z  P.  ^Ms  a  lower  triangular  matrix.  If  x  is  the  computed 

j=l  ' 
solution  then  (4.11)  reduces  to 

X  -  X  =  L'^Pf  +  e  (4.12) 


in  which 

y+1 

k=l 


e  =  I     L"^2^  n)e^''^  (4.13) 


Let  us  define  the  constants  a,  and  a^  by 


(4.14) 
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1  2 
a  =  T  n  log  n  +  O(nlog  n) 

a  =  2  n  log  n  +  0(n) 
Thus,  from  (4.7)  -  (4.10)  and  Lemma  4.1  we  have, 
IlL'^fll  <.a^  e    ||L'''||^||x|| 

I  r     -1     I  2  I 
||e||<_a2e|lL      ||     |Ix|| 

Hence,  from  (4.12) 

l|x  -  x||/||x||  <.a^  e  ||L"''||^  (4.15) 

Premulti plying  both  sides  of  (4.12)  by  L,  we  obtain 

Lx  -  f  =  Pf  +  Le  (4.16) 

From  the  fact  that  f  =  Lx,  (4.4),  (4.10),  and  (4.13),  we  have 

|Pf  +  Le|  <  |L| |x|  (4.17) 

where  L  is  a  lower  triangular  matrix.  Furthermore,  from  (4.8), 

||L||  <  a^  e  ||l"'||^  . 
Hence,  there  exists  a  perturbation  6L  such  that 

(L  +  6L)x  =  f  (4.18) 

where, 

||6L|  I  la-,  e  ||L"^||^   .  (4.19) 

If  we  remove  the  assumption  that  ||L||  £l,  then  (4.19)  becomes 

||6L||  <  a-|  e  k^(L)  1|L|  | 
in  which  k(L)  is  the  condition  number  of  L. 
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